Balancing Interpretability and Performance in Reinforcement Learning: An Adaptive Spectral Based Linear Approach
Yi, Qianxin, Lin, Shao-Bo, Fan, Jun, Wang, Yao
Reinforcement learning (RL) has been widely applied to sequential decision making, where interpretability and performance are both critical for practical adoption. Current approaches typically focus on performance and rely on post hoc explanations to account for interpretability. In contrast to these approaches, we focus on designing an interpretability-oriented yet performance-enhanced RL approach. Specifically, we propose a spectral-based linear RL method that extends the ridge regression-based approach through a spectral filter function. The proposed method clarifies the role of regularization in controlling estimation error and further enables the design of an adaptive regularization parameter selection strategy guided by the bias-variance trade-off principle. Theoretical analysis establishes near-optimal bounds for both parameter estimation and generalization error. Extensive experiments on simulated environments and real-world datasets from Kuaishou and Taobao demonstrate that our method either outperforms or matches existing baselines in decision quality. We also conduct interpretability analyses to illustrate how the learned policies make decisions, thereby enhancing user trust. These results highlight the potential of our approach to bridge the gap between RL theory and practical decision making, providing interpretability, accuracy, and adaptability in management contexts.
- North America > United States (0.27)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong > Kowloon (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
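To make the spectral-filter idea in the abstract above concrete, here is a minimal sketch of spectral regularization for a plain linear least-squares model: given the SVD $A = U\,\mathrm{diag}(s)\,V^\top$, the estimator is $w = V\,\mathrm{diag}(\varphi_\lambda(s))\,U^\top y$, where the filter $\varphi_\lambda(s) = s/(s^2+\lambda)$ recovers ridge regression and a hard-threshold filter gives truncated SVD. This is an illustrative sketch of the general technique, not the authors' method; the filters, data, and constants are assumptions.

```python
# Minimal sketch: spectral-filter regularization for a linear least-squares
# model. Illustrative assumptions throughout; not the authors' implementation.
import numpy as np

def spectral_fit(A, y, filt):
    """w = V diag(filt(s)) U^T y for the SVD A = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ (filt(s) * (U.T @ y))

lam = 0.1
ridge = lambda s: s / (s**2 + lam)                # Tikhonov filter -> ridge
tsvd = lambda s: np.where(s > lam, 1.0 / s, 0.0)  # hard threshold -> truncated SVD

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10)
y = A @ w_true + 0.1 * rng.standard_normal(200)

for name, f in [("ridge", ridge), ("tsvd", tsvd)]:
    w = spectral_fit(A, y, f)
    print(f"{name}: estimation error = {np.linalg.norm(w - w_true):.4f}")
```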
Learning Curves of Stochastic Gradient Descent in Kernel Regression
Zhang, Haihan, Lin, Weicheng, Liu, Yuanshi, Fang, Cong
Non-parametric least-squares regression within the RKHS framework represents a cornerstone of statistical learning theory. One mainstream method for solving the problem is kernel ridge regression (KRR), whose optimality has been analyzed extensively [Caponnetto and De Vito, 2007, Smale and Zhou, 2007, Zhang et al., 2024b]. Recent years have witnessed a renaissance of interest in kernel methods driven by neural tangent kernel (NTK) theory [Jacot et al., 2018, Arora et al., 2019], which states that sufficiently wide neural networks, under specific initialization, can be well approximated by a deterministic kernel model derived from the network architecture. Though deep learning often operates in regimes beyond the traditional statistical mindset, recent advances demonstrate that these generalization mysteries are not peculiar to neural networks and that the phenomena are also present in kernel regression, particularly in the high-dimensional regime [Ghorbani et al., 2021, Liang and Rakhlin, 2020, Zhang et al., 2024c]. Substantial work has been done in these regimes for kernel ridge and ridgeless methods. For instance, Liang and Rakhlin [2020] demonstrate the existence of benign overfitting for ridgeless regression, a phenomenon where the model interpolates the data yet still generalizes well.
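As a concrete reference point for the KRR and ridgeless estimators discussed above, here is a minimal kernel ridge regression sketch; letting the regularization $\lambda \to 0$ approaches the ridgeless interpolant studied by Liang and Rakhlin [2020]. The RBF kernel, bandwidth, and data model are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal kernel ridge regression sketch; lam -> 0 approaches the ridgeless
# interpolant. RBF kernel, bandwidth, and data model are assumptions.
import numpy as np

def rbf(A, B, gamma=5.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n = 100
X = rng.uniform(-1, 1, (n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)
Xt = np.linspace(-1, 1, 200)[:, None]                 # test grid

for lam in [1e-1, 1e-3, 1e-12]:                       # 1e-12: near-ridgeless
    alpha = np.linalg.solve(rbf(X, X) + n * lam * np.eye(n), y)
    mse = np.mean((rbf(Xt, X) @ alpha - np.sin(3 * Xt[:, 0])) ** 2)
    print(f"lam={lam:g}  test MSE={mse:.5f}")
```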
Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels
This paper investigates regularized stochastic gradient descent (SGD) algorithms for estimating nonlinear operators from a Polish space to a separable Hilbert space. We assume that the regression operator lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. Two significant settings are considered: an online setting with polynomially decaying step sizes and regularization parameters, and a finite-horizon setting with constant step sizes and regularization parameters. We introduce regularity conditions on the structure and smoothness of the target operator and the input random variables. Under these conditions, we provide a dimension-free convergence analysis for the prediction and estimation errors, deriving both expectation and high-probability error bounds. Our analysis demonstrates that these convergence rates are nearly optimal. Furthermore, we present a new technique for deriving bounds with high probability for general SGD schemes, which also ensures almost-sure convergence. Finally, we discuss potential extensions to more general operator-valued kernels and the encoder-decoder framework.
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
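Below is a minimal sketch of the online setting described above, specialized (as an assumption for illustration) to the separable operator-valued kernel $K(x,z) = k(x,z)\,I$ on $\mathbb{R}^d$-valued outputs: each step shrinks the current kernel expansion via the regularization term and adds one new atom from the data-fit gradient, with polynomially decaying step sizes and regularization parameters. The decay exponents, kernel, and data model are illustrative choices, not the paper's.

```python
# Minimal sketch: regularized online SGD in a vector-valued RKHS with the
# separable operator-valued kernel K(x, z) = k(x, z) * I. Decay exponents,
# kernel, and data model are illustrative assumptions.
import numpy as np

def k(Xs, z, gamma=2.0):
    """Scalar RBF kernel between each row of Xs and the point z."""
    return np.exp(-gamma * np.sum((Xs - z) ** 2, axis=-1))

rng = np.random.default_rng(0)
n, p, d = 400, 3, 2
X = rng.standard_normal((n, p))
W = rng.standard_normal((p, d))
Y = np.tanh(X @ W) + 0.05 * rng.standard_normal((n, d))

A = np.zeros((n, d))                      # f_t = sum_i K(x_i, .) a_i
for t in range(n):
    eta = 0.5 / (t + 1) ** 0.5            # polynomially decaying step size
    lam = 0.1 / (t + 1) ** 0.25           # polynomially decaying regularization
    f_xt = k(X[:t], X[t]) @ A[:t]         # current prediction at x_t
    A[:t] *= 1.0 - eta * lam              # shrinkage from the ridge penalty
    A[t] = -eta * (f_xt - Y[t])           # new kernel atom from the gradient

resid = np.array([k(X, X[i]) @ A - Y[i] for i in range(n)])
print(f"train MSE: {np.mean(resid**2):.5f}")
```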
On the Sparsity of the Strong Lottery Ticket Hypothesis
Natale, Emanuele, Ferré, Davide, Giambartolomei, Giordano, Giroire, Frédéric, Mallmann-Trenn, Frederik
Considerable research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network that is sufficiently smaller than $N$, without any training. This line of research, known as the Strong Lottery Ticket Hypothesis (SLTH), was originally motivated by the weaker Lottery Ticket Hypothesis, which states that a sufficiently large random neural network $N$ contains \emph{sparse} subnetworks that can be trained efficiently to achieve performance comparable to that of training the entire network $N$. Despite its original motivation, results on the SLTH have so far provided no guarantee on the size of subnetworks. This limitation is due to the nature of the main technical tool leveraged by these results, the Random Subset Sum (RSS) Problem. Informally, the RSS Problem asks how large a random i.i.d. sample $\Omega$ should be so that we are able to approximate any number in $[-1,1]$, up to an error of $\epsilon$, as the sum of a suitable subset of $\Omega$. We provide the first proof of the SLTH in classical settings, such as dense and equivariant networks, with guarantees on the sparsity of the subnetworks. Central to our results is the proof of an essentially tight bound on the Random Fixed-Size Subset Sum Problem (RFSS), a variant of the RSS Problem in which we only ask for subsets of a given size, which is of independent interest.
- Europe > France (0.14)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Contests & Prizes (1.00)
- Workflow (0.69)
- Research Report > New Finding (0.48)
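The RSS Problem in the abstract above is easy to probe numerically. The sketch below draws $n$ i.i.d. uniform samples on $[-1,1]$ and checks the worst-case approximation error over a grid of targets; the theory suggests $n = O(\log(1/\epsilon))$ samples suffice with high probability. The sample size, grid, and constants are illustrative assumptions.

```python
# Monte Carlo sketch of the RSS Problem; constants and grid are assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
n = 16                                     # ~ constant * log2(1/eps) samples
sample = rng.uniform(-1, 1, n)

# All 2^n subset sums. (The RFSS variant would fix r to a single size.)
sums = np.array([sum(c) for r in range(n + 1)
                 for c in itertools.combinations(sample, r)])

targets = np.linspace(-1, 1, 201)
worst = max(np.min(np.abs(sums - z)) for z in targets)
print(f"worst-case approximation error: {worst:.5f} (target eps = {eps})")
```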
Random feature approximation for general spectral methods
Random feature approximation is arguably one of the most popular techniques to speed up kernel methods in large scale algorithms and provides a theoretical approach to the analysis of deep neural networks. We analyze generalization properties for a large class of spectral regularization methods combined with random features, containing kernel methods with implicit regularization such as gradient descent or explicit methods like Tikhonov regularization. For our estimators we obtain optimal learning rates over regularity classes (even for classes that are not included in the reproducing kernel Hilbert space), which are defined through appropriate source conditions. This improves or completes previous results obtained in related settings for specific kernel algorithms.
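As a concrete instance of the setup above, here is a minimal random Fourier features sketch: the Gaussian kernel is approximated by $z(x)^\top z(x')$ with $z(x) = \sqrt{2/D}\,\cos(W^\top x + b)$, and Tikhonov regularization is then run on the features, one of the explicit spectral methods the abstract mentions. The feature count, bandwidth, and data are illustrative assumptions.

```python
# Minimal random Fourier features + Tikhonov regularization sketch.
# Feature count D, bandwidth sigma, and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, D, sigma, lam = 300, 2, 200, 0.5, 1e-3

X = rng.uniform(-1, 1, (n, p))
y = np.sin(2 * X[:, 0]) * np.cos(X[:, 1]) + 0.1 * rng.standard_normal(n)

# z(x)^T z(x') approximates the Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2))
W = rng.standard_normal((p, D)) / sigma    # frequencies omega ~ N(0, sigma^-2 I)
b = rng.uniform(0, 2 * np.pi, D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Explicit spectral method on the features: Tikhonov (ridge) regularization
w = np.linalg.solve(Z.T @ Z + n * lam * np.eye(D), Z.T @ y)
print(f"train MSE: {np.mean((Z @ w - y) ** 2):.5f}")
```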
Best Arm Identification with Safety Constraints
Wang, Zhenlin, Wagenmaker, Andrew, Jamieson, Kevin
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint take a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness.
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
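To illustrate the exploration pattern described above, here is a hedged sketch of safe best-arm identification under a deliberately simple scalar linear safety model (an assumption for illustration, not the authors' algorithm): each arm $i$ has a known dose $x_i$ and unknown safety cost $\gamma x_i$, an arm may be pulled only once a high-confidence upper bound on its cost falls below the threshold $\tau$, and the empirically best certified-safe arm is returned.

```python
# Hedged sketch: safe best-arm identification with a scalar linear safety
# model cost_i = gamma * x_i (gamma unknown). Not the authors' algorithm;
# all constants are assumptions.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.2, 0.4, 0.6, 0.8, 1.0])    # known doses (arm features)
mu = np.array([0.2, 0.5, 0.8, 0.9, 0.6])   # unknown reward means
gamma_true = 0.9                           # unknown safety slope
tau = 0.7                                  # safety constraint: gamma * x_i <= tau

n = np.zeros(len(x)); rsum = np.zeros(len(x))
g_num = g_den = 0.0                        # running least squares for gamma
g_hi = 2.0                                 # prior upper bound on gamma

for t in range(3000):
    safe = np.where(g_hi * x <= tau)[0]    # arms certified safe so far
    i = safe[np.argmin(n[safe])]           # pull the least-sampled safe arm
    r = mu[i] + 0.1 * rng.standard_normal()              # reward observation
    c = gamma_true * x[i] + 0.1 * rng.standard_normal()  # cost observation
    n[i] += 1; rsum[i] += r
    g_num += x[i] * c; g_den += x[i] ** 2
    g_hi = min(g_hi, g_num / g_den + 3.0 / np.sqrt(g_den))  # crude UCB on gamma

safe = np.where(g_hi * x <= tau)[0]
best = safe[np.argmax(rsum[safe] / np.maximum(n[safe], 1))]
print(f"certified-safe arms: {safe}, recommended arm: {best}")
```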
On formal concepts of random formal contexts
In formal concept analysis, it is well-known that the number of formal concepts can be exponential in the worst case. To analyze the average case, we introduce a probabilistic model for random formal contexts and prove that the average number of formal concepts has a superpolynomial asymptotic lower bound.
- North America > United States > New York (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- (2 more...)
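The average-case model above is straightforward to simulate. The brute-force sketch below counts the formal concepts of a random context in which each object-attribute incidence is an independent Bernoulli($p$) draw (the sizes and $p$ are illustrative assumptions); a concept corresponds to a closed attribute set $B$ with $B'' = B$.

```python
# Brute-force sketch: count formal concepts of random contexts where each
# incidence is Bernoulli(p). Sizes and p are illustrative assumptions.
import itertools
import numpy as np

def num_concepts(I):
    """Count formal concepts of the binary incidence matrix I (objects x attributes)."""
    n_obj, n_att = I.shape
    count = 0
    for r in range(n_att + 1):
        for B in map(set, itertools.combinations(range(n_att), r)):
            extent = [g for g in range(n_obj) if all(I[g, m] for m in B)]       # B'
            intent = {m for m in range(n_att) if all(I[g, m] for g in extent)}  # B''
            if intent == B:            # B is closed, hence the intent of a concept
                count += 1
    return count

rng = np.random.default_rng(0)
n_obj, n_att, p = 12, 10, 0.5
counts = [num_concepts(rng.random((n_obj, n_att)) < p) for _ in range(5)]
print("concept counts over 5 random contexts:", counts)
```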